-
Notifications
You must be signed in to change notification settings - Fork 816
20190117 prom rules endpoints #1999
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
20190117 prom rules endpoints #1999
Conversation
d198452
to
ad00183
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job @jtlisi ! It's the first time I'm seeing the ruler internals, so please be patient going through the comments I've left (including some questions).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Jacob for addressing my feedback. I've done a second pass review and started doing some manual tests, and I've the feeling this PR is not production-grade yet. I would prefer if, after addressing the comments, you could do extensive manual tests before asking for the next review.
Could you also rebase this PR, please?
pkg/ruler/api.go
Outdated
|
||
// RegisterRoutes registers the configs API HTTP routes with the provided Router. | ||
func (r *Ruler) RegisterRoutes(router *mux.Router) { | ||
if r.store == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a real case? I can't see how it could happen.
pkg/ruler/mock_store.go
Outdated
@@ -4,6 +4,7 @@ import ( | |||
"context" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of small things:
- We usually name such files with the
_mock
suffix (instead of prefix) - In order to have this file compiled only with tests, you could add a
_test.go
suffix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to change it to store_mock_test.go lol
pkg/ruler/ruler.go
Outdated
interval := group.Interval() | ||
groupDesc := &rules.RuleGroupDesc{ | ||
Name: group.Name(), | ||
Namespace: group.File(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two things:
- Why is it called
Namespace
if it then maps theFile
in the API response (seeapi.go
)? - Is it really a good thing exposing the path of the file inside Cortex? Looks we may disclose a sensitive information in a Cortex as a service scenario
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I updated the downstream path to strip the internal cortex file path. The only thing that will be left is the namespace of the users rule group.
The reason Namespace was chosen instead of file is basically to avoid have file be an API object when interacting with Cortex. It's more semantics than anything else, however I think it is good to conceptually differ between the two.
I mention it here: https://docs.google.com/document/d/1gNKiC5L_zWQ4GGgtdhJJ6FnaZXyJJBvdGTUmF0wgEFg/edit?usp=sharing
@pracucci I'm only part way done refactoring this. I'll try to get it ready for another pass by Monday |
cb310ec
to
8ecebe6
Compare
@pracucci This should be ready for another look. The scope of this PR broadened a bit while getting the tests set up. The ruler and mapper are now explicitly passed a logger upon instantiation. Also there was some updates to the fixtures to make sure the ruler loaded it's rules before the tests are run. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jtlisi for addressing my feedback. I kindly ask you a bit more patience to take a look at the last comments. There are many small things, except a couple of critical ones (wrong API path and missing return err
in initRuler()
) 🙏
CHANGELOG.md
Outdated
@@ -6,6 +6,7 @@ | |||
* [CHANGE] Moved `--store.min-chunk-age` to the Querier config as `--querier.query-store-after`, allowing the store to be skipped during query time if the metrics wouldn't be found. The YAML config option `ingestermaxquerylookback` has been renamed to `query_ingesters_within` to match its CLI flag. #1893 | |||
* `--store.min-chunk-age` has been removed | |||
* `--querier.query-store-after` has been added in it's place. | |||
* [FEATURE] Added flag `experimental.ruler.enable-api` to enable the ruler api which implements the Prometheus API `/api/v1/rules` and `/api/v1/alerts` endpoints. #1999 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which implements the Prometheus API
/api/v1/rules
and/api/v1/alerts
endpoints
True, but we export them under /api/prom/...
. I would mention that as well, to make it more clear to the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll make it clear it is prefixed with the configured -http.prefix
.
pkg/cortex/modules.go
Outdated
@@ -456,7 +461,7 @@ func (t *Cortex) initAlertmanager(cfg *Config) (err error) { | |||
|
|||
// TODO this clashed with the queirer and the distributor, so we cannot | |||
// run them in the same process. | |||
t.server.HTTP.PathPrefix("/api/prom").Handler(middleware.AuthenticateUser.Wrap(t.alertmanager)) | |||
t.server.HTTP.PathPrefix(cfg.HTTPPrefix).Handler(middleware.AuthenticateUser.Wrap(t.alertmanager)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth to mention it in the changelog as a separate entry? I guess it's an [ENHANCEMENT]
. Not sure, just asking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure when HTTPPrefix was added. I think the best thing to do would be to create a small PR that ensures everything is configured using the Prefix and adding an [ENHANCEMENT]
to the change log clarifying it then. I'll revert the prefix for anything unrelated to the ruler in this PR for now.
pkg/ruler/api.go
Outdated
"github.com/cortexproject/cortex/pkg/util" | ||
) | ||
|
||
// RegisterRoutes registers the configs API HTTP routes with the provided Router. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
registers the configs API
> registers the ruler API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
pkg/cortex/modules.go
Outdated
@@ -415,11 +415,16 @@ func (t *Cortex) initRuler(cfg *Config) (err error) { | |||
cfg.Ruler.Ring.ListenPort = cfg.Server.GRPCListenPort | |||
queryable, engine := querier.New(cfg.Querier, t.distributor, t.store) | |||
|
|||
t.ruler, err = ruler.NewRuler(cfg.Ruler, engine, queryable, t.distributor) | |||
t.ruler, err = ruler.NewRuler(cfg.Ruler, engine, queryable, t.distributor, prometheus.DefaultRegisterer, util.Logger) | |||
if err != nil { | |||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be return err
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch
pkg/ruler/ruler.go
Outdated
|
||
var groups []*promRules.Group | ||
r.userManagerMtx.Lock() | ||
mngr, exists := r.userManagers[userID] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could compact it into the following, which also reduce the scope of variables:
if mngr, exists := r.userManagers[userID]; exists {
groups = mngr.RuleGroups()
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
pkg/ruler/ruler.go
Outdated
} | ||
|
||
func (r *Ruler) getLocalRules(userID string) ([]*rules.RuleGroupDesc, error) { | ||
groupDescs := []*rules.RuleGroupDesc{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you move this line after the locked block, you can do groupDescs := make([]*rules.RuleGroupDesc{}, 0, len(groups))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice
continue | ||
} | ||
} | ||
r.syncManager(ctx, user, filteredGroups) | ||
} | ||
|
||
// Check for deleted users and remove them |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to myself: this block hasn't changed, except the logger reference.
pkg/ruler/ruler.go
Outdated
} | ||
|
||
if update { | ||
level.Info(r.logger).Log("msg", "updating rules", "user", "user") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Info or debug level? Up to you, just wanted to raise it, cause may be a bit verbose for an Info level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think an update would be a relatively rare event? Since we have a metric I'm ok with lowering it to debug though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All in all, LGTM. I haven't manually tested getShardedRules()
, so I do expect you will do it before merging. I left few more nits, but they're all trivial, so no need to re-check it from my side. However, please re-request me a review if something more consistent will change 🙏
State: rl.GetState(), | ||
Name: rl.GetAlert(), | ||
Query: rl.GetExpr(), | ||
Duration: rl.For.Seconds(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as it's not a breaking change, LGTM.
Signed-off-by: Jacob Lisi <[email protected]>
Signed-off-by: Jacob Lisi <[email protected]>
Signed-off-by: Jacob Lisi <[email protected]>
Signed-off-by: Jacob Lisi <[email protected]>
Signed-off-by: Jacob Lisi <[email protected]>
Signed-off-by: Jacob Lisi <[email protected]>
Signed-off-by: Jacob Lisi <[email protected]>
Signed-off-by: Jacob Lisi <[email protected]>
Signed-off-by: Jacob Lisi <[email protected]>
Signed-off-by: Jacob Lisi <[email protected]>
Signed-off-by: Jacob Lisi <[email protected]>
Signed-off-by: Jacob Lisi <[email protected]>
Signed-off-by: Jacob Lisi <[email protected]>
Signed-off-by: Jacob Lisi <[email protected]>
Signed-off-by: Jacob Lisi <[email protected]>
Signed-off-by: Jacob Lisi <[email protected]>
Signed-off-by: Jacob Lisi <[email protected]>
Signed-off-by: Jacob Lisi <[email protected]>
Signed-off-by: Jacob Lisi <[email protected]>
Signed-off-by: Jacob Lisi <[email protected]>
Signed-off-by: Jacob Lisi <[email protected]>
Signed-off-by: Jacob Lisi <[email protected]>
Signed-off-by: Jacob Lisi <[email protected]>
Signed-off-by: Jacob Lisi <[email protected]>
Signed-off-by: Jacob Lisi <[email protected]>
Signed-off-by: Jacob Lisi <[email protected]>
Signed-off-by: Jacob Lisi <[email protected]>
Signed-off-by: Jacob Lisi <[email protected]>
Signed-off-by: Jacob Lisi <[email protected]>
1009efa
to
ecf1860
Compare
What this PR does:
This PR implements the Prometheus API alerts and rules endpoints in ruler. It adds a GRPC service to the ruler to allow for an implementation that works when rules are sharded across multiple rulers.
Which issue(s) this PR fixes:
Fixes #1720
Checklist
CHANGELOG.md